Extracting textual information from images and videos for automatic content-based annotation and retrieval
Author
Abstract
One way to use semantic knowledge for annotating databases of digital images and videos is to exploit the textual information present in them. Such text usually provides important information about the content and is well suited to keyword-based queries. In this context, the extraction of scene text and artificial text from images and videos is an important research problem, with the aim of achieving automatic content-based retrieval and summarization of visual information.

The process of text extraction includes several steps:
• Text detection identifies the image parts that contain text.
• Text localization merges text regions that belong to the same text candidate and determines the exact text positions.
• Text tracking follows the localized text over successive frames in a video.
• Text segmentation and binarization separate the localized text from the image background. The output of this step is a binary image in which black text characters appear on a white background.
• Character recognition performs optical character recognition (OCR) on the binarized image and converts it to ASCII text.

In this thesis, a robust system for automatically extracting text that appears in images and videos with complex backgrounds is presented. Different algorithms are proposed for the different steps of the text extraction process listed above. The system can operate on JPEG images and MPEG-1 videos. The tracking of text appearing in videos is also addressed, and a novel algorithm is presented. Individual and comparative experimental results demonstrate the performance of the proposed algorithms for the main processing steps (text detection, localization, and segmentation) and, in particular, for their combination.

Text in images or videos can appear in different scripts, such as Latin, ideographic, or Arabic. Identifying the script in use can help improve segmentation results and increase OCR accuracy by allowing the appropriate algorithms to be chosen. Thus, a novel technique for script recognition in complex images is presented.

Content-based media retrieval has received a lot of attention in recent years, and query by example is the most widely used methodology. In this context, it may be of interest to search for images or video frames in which text visually similar to the input text image appears. Thus, a novel technique for the holistic comparison of text images is proposed.

Recently, relevance feedback methods have attracted researchers because they make it possible to interact with the user to increase the performance of a content-based image retrieval (CBIR) system. However, because of the growing number of images and the user's need to explore the media before making a decision, techniques for visualizing or browsing a collection of images are becoming important. Consequently, several visualization and browsing methods are proposed to facilitate the interactive exploratory analysis of large image data sets and to assist the user during semantic search.
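The detection, localization, and binarization steps listed in the abstract can be illustrated with a small toy pipeline. This is a minimal sketch under stated assumptions, not the algorithms proposed in the thesis: the edge-counting heuristic, the mean-value threshold, the function names, and the parameters `min_edges` and `jump` are all hypothetical choices made for illustration. Text tracking and OCR are left out.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image, pixel values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def detect_text_rows(img: Image, min_edges: int = 2, jump: int = 100) -> List[bool]:
    """Text detection (toy heuristic, an assumption): flag rows that contain
    at least `min_edges` strong intensity jumps between neighbouring pixels."""
    flags = []
    for row in img:
        edges = sum(1 for a, b in zip(row, row[1:]) if abs(a - b) >= jump)
        flags.append(edges >= min_edges)
    return flags


def localize(img: Image, flags: List[bool], jump: int = 100) -> List[Box]:
    """Text localization: merge consecutive flagged rows into one box and
    tighten its columns around the detected jumps (one pixel of margin)."""
    boxes: List[Box] = []
    top = None
    for y, flagged in enumerate(flags + [False]):  # sentinel closes the last run
        if flagged and top is None:
            top = y
        elif not flagged and top is not None:
            cols = [x
                    for row in img[top:y]
                    for x, (a, b) in enumerate(zip(row, row[1:]))
                    if abs(a - b) >= jump]
            boxes.append((top, min(cols), y, max(cols) + 2))
            top = None
    return boxes


def binarize(img: Image, box: Box) -> Image:
    """Segmentation and binarization: threshold the box at its mean intensity
    and normalise polarity so that text is black (0) on white (255).
    Assumption: text covers fewer pixels than background inside the box."""
    top, left, bottom, right = box
    crop = [row[left:right] for row in img[top:bottom]]
    pixels = [p for row in crop for p in row]
    threshold = sum(pixels) / len(pixels)
    dark = sum(1 for p in pixels if p < threshold)
    text_is_dark = dark <= len(pixels) - dark
    return [[0 if (p < threshold) == text_is_dark else 255 for p in row]
            for row in crop]


def extract_text_regions(img: Image, jump: int = 100) -> List[Tuple[Box, Image]]:
    """Run detection, localization, and binarization in sequence."""
    flags = detect_text_rows(img, jump=jump)
    boxes = localize(img, flags, jump=jump)
    return [(box, binarize(img, box)) for box in boxes]
```

Each returned pair holds a text box and its binarized crop, which is the form an OCR engine would consume in the final character recognition step. The polarity check in `binarize` reflects the abstract's requirement that the output always shows black characters on a white background, whether the source text was light on dark or dark on light.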
Similar resources
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. In most cases, inappropriate tags generated by users, as well as images without any tags, are among the challenges in this field and have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
Fuzzy Neighbor Voting for Automatic Image Annotation
With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords, or comments to an image or to a selected part of it. In this paper,...
Image Annotation System Using Visual and Textual Features
We present an automated image annotation system called I-Tag, which uses both visual and textual information from images and recommends relevant tags for them. The automatic generation of metadata would allow image searches and content-based image retrieval (CBIR) to be more effective. We use state-of-the-art tools for text-based retrieval and content-based image retrieval to retrieve similar i...
Semantic-Based Cross-Media Image Retrieval
In this paper, we propose a novel method for cross-media semantic-based information retrieval, which combines classical text-based and content-based image retrieval techniques. This semantic-based approach aims at determining the strong relationships between keywords (in the caption) and types of visual features associated with its typical images. These relationships are then used to retrieve im...
Application of MPEG-7 descriptors for content-based indexing of sports videos
The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time consuming, an automatic solution is appreciated. We review recent approaches to content-based indexing and annotatio...
Publication date: 2007